Performance-guarantee gene predictions via spliced alignment.

Authors

  • A A Mironov
  • M A Roytberg
  • P A Pevzner
  • M S Gelfand

Abstract

An important and still unsolved problem in gene prediction is designing an algorithm that not only predicts genes but also estimates the quality of each individual prediction. Since experimental biologists are interested mainly in the reliability of individual predictions (rather than in the average reliability of an algorithm), we attempted to develop a gene recognition algorithm that guarantees a certain quality of predictions. We demonstrate here that the level of similarity to a related protein is a reliable quality estimator for the spliced alignment approach to gene recognition. We also study the average performance of the spliced alignment algorithm for different targets on a complete set of human genomic sequences with known relatives, and demonstrate that the average performance of the method remains high even for very distant targets. Using plant, fungal, and prokaryotic target proteins to recognize human genes yields accurate predictions with correlation coefficients of 95, 93, and 91%, respectively. For target proteins with a similarity score above 60%, not only is the average correlation coefficient very high (97% and above), but the quality of each individual prediction is also guaranteed to be at least 82%. This indicates that, at this level of similarity, the worst-case performance of the spliced alignment algorithm is better than the average-case performance of many statistical gene recognition methods.
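The abstract reports accuracy as a correlation coefficient between predicted and actual gene structures. The sketch below shows how such a figure can be computed, assuming the nucleotide-level correlation coefficient widely used to benchmark gene finders and assuming exons are given as 0-based, half-open (start, end) intervals; the paper's exact evaluation procedure may differ, and the names `coding_mask` and `correlation_coefficient` are hypothetical. A value of 0.95 corresponds to the "95%" style used in the abstract.

```python
import math
from typing import Iterable, List, Tuple

# Hypothetical representation: an exon is a 0-based, half-open (start, end)
# interval of genomic coordinates.
Interval = Tuple[int, int]


def coding_mask(exons: Iterable[Interval], length: int) -> List[bool]:
    """Mark every nucleotide covered by at least one exon as coding."""
    mask = [False] * length
    for start, end in exons:
        for i in range(start, end):
            mask[i] = True
    return mask


def correlation_coefficient(predicted: Iterable[Interval],
                            actual: Iterable[Interval],
                            length: int) -> float:
    """Nucleotide-level correlation coefficient between two exon sets."""
    pred = coding_mask(predicted, length)
    true = coding_mask(actual, length)
    pairs = list(zip(pred, true))
    tp = sum(p and t for p, t in pairs)          # coding, predicted coding
    tn = sum(not p and not t for p, t in pairs)  # non-coding, predicted non-coding
    fp = sum(p and not t for p, t in pairs)      # non-coding, predicted coding
    fn = sum(not p and t for p, t in pairs)      # coding, missed by the prediction
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        # Convention assumed here: an undefined coefficient is reported as 0.
        return 0.0
    return (tp * tn - fn * fp) / denom


if __name__ == "__main__":
    true_exons = [(10, 50), (80, 120)]
    # A prediction that finds only the first exon scores roughly 0.61.
    print(correlation_coefficient([(10, 50)], true_exons, 200))
```

Per-nucleotide masks keep the sketch short; for long genomic sequences, computing overlaps directly from sorted intervals would avoid allocating a mask over the full sequence length.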

Similar resources

GeneSeqer@PlantGDB: Gene structure prediction in plant genomes.

The GeneSeqer@PlantGDB Web server (http://www.plantgdb.org/cgi-bin/GeneSeqer.cgi) provides a gene structure prediction tool tailored for applications to plant genomic sequences. Predictions are based on spliced alignment with source-native ESTs and full-length cDNAs or non-native probes derived from putative homologous genes. The tool is illustrated with applications to refinement of current ge...

Optimal spliced alignment of homologous cDNA to a genomic DNA template

MOTIVATION Supplementary cDNA or EST evidence is often decisive for discriminating between alternative gene predictions derived from computational sequence inspection by any of a number of requisite programs. Without additional experimental effort, this approach must rely on the occurrence of cognate ESTs for the gene under consideration in available, generally incomplete, EST collections for t...

ProMult: Prediction of the Exon–Intron Structure by Spliced Alignment with Several Proteins

All existing similarity-based gene recognition algorithms can use only one protein as a template. The proposed enhancement of the ProFrame algorithm allows one to use structural information about several related proteins in multiple alignment. The new algorithm, named ProMult, was tested on a sample of human genes and demonstrated improved reliability of predictions.

Improving spliced alignment for identification of ortholog groups and multiple CDS alignment

The Spliced Alignment Problem (SAP) that consists in finding an optimal semi-global alignment of a spliced RNA sequence on an unspliced genomic sequence has been largely considered for the prediction and the annotation of gene structures in genomes. Here, we re-visit it for the purpose of identifying CDS ortholog groups within a set of CDS from homologous genes and for computing multiple CDS al...

Faster exon assembly by sparse spliced alignment

Assembling a gene from candidate exons is an important problem in computational biology. Among the most successful approaches to this problem is spliced alignment, proposed by Gelfand et al., which scores different candidate exon chains within a DNA sequence of length m by comparing them to a known related gene sequence of length n, m = Θ(n). Gelfand et al. gave an algorithm for spliced alignme...

Journal:
  • Genomics

Volume 51, Issue 3

Publication date: 1998